Pavlov, a well-known strategy in game theory, has been shown to have some advantages
in the Iterated Prisoner's Dilemma (IPD) game. However, this strategy can be exploited by
inveterate defectors. We modify this strategy to mitigate the exploitation. We call the resulting
strategy Rational Pavlov. This has a parameter p which measures the "degree of forgiveness"
of the players. We study the evolution of cooperation in the IPD game, when n players are
arranged in a cycle, and all play this strategy. We examine the effect of varying p on the
convergence rate and prove that the convergence rate is fast, $O(n \log n)$ time, for high values of
p. We also prove that the convergence rate is exponentially slow in n for small enough p. Our
analysis leaves a gap in the range of p, but simulations suggest that there is, in fact, a sharp
phase transition.

1 Introduction

1.1 Overview

The Prisoner's Dilemma (PD) is one of the most famous strategic games in game theory (see,
for example, [13]). This game is widely used as a prototype for the study of the evolution of
cooperation among selfish agents. It has attracted a large amount of interest from researchers
in diverse fields, because it represents a very common strategic situation that needs
to be understood. Many real-life problems can be modelled by the Prisoner's Dilemma [3].

In the standard form of PD, the payoff obtained when both prisoners cooperate with each
other is denoted by R, the reward for mutual cooperation. The payoff gained when both defect
is denoted by P, the punishment for mutual defection. Finally, T (the temptation to defect)
is earned by the informer and S (the sucker's payoff) is earned by the other when one defects
and the other cooperates. These four outcomes are shown in Figure 1 in a matrix form. In
this game, the payoff of an action does not depend on the player. Hence the game is said to be
symmetric.

                             Column Player
                          Cooperate    Defect
Row Player   Cooperate    R = 3        S = 0
             Defect       T = 5        P = 1

Figure 1: The payoff matrix for the Prisoner's Dilemma with Axelrod's numerical example.
The game is symmetric, therefore only the payoffs to the row player are shown.

The interesting feature of the PD game is the property that the four payoffs satisfy T > R >
P > S. Hence, in a one-shot game, it is always best to defect. Thus, self-interest of the players
leads to the payoff P, which is worse than the R that both players would get by cooperating, hence
the dilemma. In the Iterated Prisoner's Dilemma (IPD), the same players meet again with a high
probability, thus getting an opportunity to punish each other for any previous non-cooperative
moves. Understandably, the fear of retaliation here is likely to encourage cooperation. This was
studied in fP , which stimulated work in this area. Another constraint 2R > S + T is usually 
added to the standard form of the IPD 11 . If this constraint is not present, players could benefit 
from receiving S and T on alternate rounds rather than R on every round through continuous 
cooperation. 

A great deal of research has been done to find an ideal strategy for the IPD. A strategy
helps players decide whether to cooperate or defect in the current round. A simple strategy called
tit-for-tat (TFT) surprisingly won Axelrod's seminal computer tournaments [1]. TFT cooperates
on the first round, and thereafter copies what the opponent did on the previous round.
However, this strategy has two main problems: firstly, it is not evolutionarily stable [2, 12]; and
secondly, any mistakes by the agents or any noise in the responses may cause a misinterpretation
leading to irrecoverable retaliation sequences. (Informally, a strategy is evolutionarily stable if a
population of players adopting the strategy cannot be overrun by any mutant strategy.)

Another well-known strategy is called Pavlov. Pavlov, an exemplar of the win-stay
lose-shift strategy, works as follows. On each iteration of the game, if a Pavlov player's payoff
is one of the two smaller payoffs, i.e. P or S, then he switches his action in the next round
of the game, otherwise he keeps the same action. It is claimed in [7, 10, 15] that Pavlov
performs better than TFT. This is due to its ability to recover from noise and errors and its
capability to exploit unconditional cooperators (All-C). However, Pavlov has two main issues.
Firstly, Pavlov is deterministic, thus it cannot represent uncertainties present in the real world,
such as the stochastic nature of biological interactions [15]. Secondly, it fares poorly against
all-time defectors (All-D). This is because, when played against All-D, Pavlov is punished for
defecting, so switches to cooperation, just to be punished even more. This is repeated forever,
and consequently Pavlov collects the sucker's payoff (S) on alternate rounds.
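
The exploitation by All-D described above can be illustrated with a minimal sketch. It assumes Axelrod's payoff values from Figure 1 and that the Pavlov player cooperates on the first round; the function names are hypothetical and chosen only for this illustration.

```python
# Axelrod's numerical payoffs (assumed from Figure 1).
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S and 2 * R > S + T  # standard IPD constraints


def payoff(me, other):
    """Payoff to `me`; True means cooperate, False means defect."""
    if me and other:
        return R
    if me and not other:
        return S
    if not me and other:
        return T
    return P


def pavlov_next(action, received):
    """Win-stay, lose-shift: keep the action after T or R, switch after P or S."""
    return action if received in (T, R) else not action


def pavlov_vs_all_d(rounds, first_action=True):
    """Payoffs collected by a Pavlov player facing an unconditional defector."""
    action, history = first_action, []
    for _ in range(rounds):
        received = payoff(action, False)  # All-D always defects
        history.append(received)
        action = pavlov_next(action, received)
    return history


print(pavlov_vs_all_d(6))  # [0, 1, 0, 1, 0, 1]
```

The output alternates between S and P, so the Pavlov player receives the sucker's payoff on every second round.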

A family of stochastic Pavlovian strategies $\mathcal{P}(k,\ell)$, for a fixed $\ell$ and $0 \le k \le \ell$, has also
been studied and hailed as a near-ideal strategy for the IPD in [10]. $\mathcal{P}(k,\ell)$ cooperates with
probability $k/\ell$. At the end of each round of the game, k is increased if the player gains T
or R, and decreased otherwise. The advantages of these strategies are that they are adaptive and
naturally stochastic. The disadvantages are that they take exponential time in $\ell$ to learn to
cooperate and are exploitable by All-D. It is worth mentioning that $\mathcal{P}(1,1)$ is equivalent to the
Pavlov strategy described above.

Before we move on, let us represent the Pavlov strategy as a (deterministic) Markov chain.
Suppose two agents play the IPD using Pavlov. This can be modelled as a Markov chain
having four states, each representing a possible combination of the strategies of the agents. We
denote these states by ++, +−, −+ and −−. (Here + stands for cooperation and − stands for
defection.) Thus, ++, for example, represents the scenario where both agents cooperate. The
transition diagram for this process is shown in Figure 2(a).



1.2 Rational Pavlov strategies on IPD 

It is now clear that the main weakness of the Pavlov is that it can be exploited by All-D. Thus 
we suggest an enhancement to this strategy. We modify it to add randomness. This, we think, 
makes the resultant strategy more rational and robust. The details of the modification are given 
below. 

A Pavlov player cooperates in the current round if both he and his opponent cooperated or
defected at their previous play. Thus, a transition from −− to ++ happens in a single repetition
with certainty. We will modify this in two ways, so that the transition from −− to ++ will only
happen with some probability less than 1. More precisely, the modifications introduced to the
−− to ++ transition are: if both players defected in the previous play, i.e. they are in state −−, then

1. each player decides independently whether to cooperate in the current round with
probability p. The transition diagram of the strategy obtained after this modification is shown
in Figure 2(b). As we believe that this modification adds some rationality to Pavlov, we
call this strategy Rational Pavlov (RP).

2. both cooperate in the current round with probability p. The transition diagram of this
strategy is shown in Figure 2(c). This is a simplified version of RP, hence the name
Simplified Rational Pavlov (SRP). Even in the absence of communication, players deciding
together with probability p can also be justified using the superrationality principle [6].
Thus, SRP might also be expanded as Super Rational Pavlov.

It is noteworthy that RP and SRP are equivalent when p = 1 or p = 0, and that both
reduce to the original Pavlov strategy when p = 1.
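
The two modifications can be summarised by the distribution over the next edge state when both players defected in the previous play. The following is a minimal sketch under that reading; the function name and the string labels for the states are hypothetical.

```python
def next_state_from_mutual_defection(strategy, p):
    """Distribution over the next edge state when the current state is '--'.

    '+' denotes cooperation and '-' denotes defection. `strategy` is one of
    'PS' (original Pavlov), 'RP' or 'SRP'.
    """
    if strategy == "PS":
        # Original Pavlov: both players switch to cooperation with certainty.
        return {"++": 1.0, "+-": 0.0, "-+": 0.0, "--": 0.0}
    if strategy == "RP":
        # Each player independently cooperates with probability p.
        return {"++": p * p, "+-": p * (1 - p), "-+": (1 - p) * p, "--": (1 - p) ** 2}
    if strategy == "SRP":
        # Both players cooperate together with probability p.
        return {"++": p, "+-": 0.0, "-+": 0.0, "--": 1 - p}
    raise ValueError("unknown strategy: " + strategy)


for name in ("PS", "RP", "SRP"):
    print(name, next_state_from_mutual_defection(name, 0.9))
```

Setting p = 1 in either RP or SRP recovers the Pavlov distribution, and setting p = 0 leaves the edge in state −− under both, which matches the equivalences noted above.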



(a) Pavlov Strategy (PS) (b) Rational Pavlov (RP) (c) Simplified RP (SRP)

Figure 2: Transition diagrams of the original and the modified Pavlovian strategies. Here, "+"
represents cooperation and "−" represents defection. The transition probabilities are shown on
the edges.



1.3 Previous work 

Although our work appears to be the first to formally define strategies like RP and SRP, there
is some evidence in the literature that supports the intuition behind the proposed improvements.
Firstly, the results from experiments with humans in [20] overwhelmingly support this. The
results show that humans use a Pavlov-like strategy that is smarter than the classic Pavlov
strategy when dealing with All-D. This Pavlov-like strategy cooperates after state −− with
probability less than 1, as RP and SRP do. Not surprisingly, the players using this strategy
were more successful than the others in the experiments. Furthermore, a similar modification
has been suggested as a possible improvement of Pavlov in [10]. Finally, a strategy similar
to RP and SRP has proved to be the winner in computer simulations as well [5].

Apart from reigning supreme in evolutionary game theory, Pavlov has been studied in
distributed Artificial Intelligence as a learning model. Shoham and Tennenholtz [19] introduced
the notion of co-learning where agents try to adapt to their environment by adapting to one
another's behaviour. In the same paper, they also defined a simple co-learning update rule,
namely Highest Cumulative Reward (HCR). This rule states that an agent should adapt to the
action that resulted in favourable feedback in the latest $\mu$ iterations, where $\mu$ is the memory
size. The HCR update rule ensures that cooperation emerges at the end in the IPD game. This
update rule with $\mu = 0$, which is one of the most efficient memory sizes [5], is precisely the
Pavlov strategy.

Shoham and Tennenholtz studied the evolution of cooperation for the HCR update rule 
in unstructured population and concluded that it is an impractical model for the evolution of 
cooperation. This conclusion is not surprising, as it is now well known that, in an unstructured 
population, natural selection favours defection over cooperation [17]. Hence, there is a growing
interest in studying the evolution of cooperation when the topology for interactions is not 
complete (see, for example, [TSl HZ]). Thus we consider the players to be arranged as the
vertices of a graph, and they can interact only along the edges of the graph. Kittock [9] studied
the effects of an interaction graph on the emergence of cooperation under the dynamics in which, 
at every step, two adjacent players are selected uniformly at random to play the IPD game using 
the Pavlov strategy. The paper [9] presented the results from an empirical study which shows 
that the time needed for the emergence of cooperation in the IPD game is polynomial on cycles 
and exponential on complete graphs. 

Most of the work we have mentioned above is empirical. However, the need for rigorous
results has rightly been emphasised (see, for example, [9, 11, 19]). The reason for the lack
of rigorous analysis of games on graphs is that it is complicated due to the vast number of 
patterns that can be generated [M]. While the empirical results do give some insights into 
the evolution, some of the results are far too complicated to be understood without theoretical 
backing. The results obtained through rigorous analysis are often more revealing and contribute 
to a clearer understanding of the problem . Hence, in this paper, we analyse the behaviour of 
RP rigorously. More precisely, we establish the conditions for fast convergence, and determine 
the rate of convergence to cooperation when all players play RP. These measures are central to 
understanding the emergence of cooperation among selfish agents [1] . 

On the theoretical side, Dyer et al. [4] studied the two cases examined in [9], using rigorous
analysis. Mossel and Roch [l^ did a similar study for some expander graphs and bounded degree 
trees and showed that the convergence is slow in both settings for the Pavlov strategy. Istrate 
et al. [8] investigated the robustness of these convergence results under adversarial scheduling 
in which an adversary selects which players update at every step. Their results show that if an 
adversary can specify two players for the update, the game might never converge. Along this 
line of work, we carry out a rigorous analysis of RP in this paper. In particular, we attempt 
to find the range of p that favours fast evolution of cooperation and the range of p that makes 
the evolution of cooperation exponentially slow, when the IPD is played on the cycle using RP. 
(Here, we consider speed of convergence as a function of the number of players, n.) All our 
results are complemented by simulation results. Our choice of graph, the cycle, is an extreme 
case, where every player has only two neighbours. Game dynamics have previously been analysed 
for the cycle [4, 9, 17]. Our analysis yields some interesting results; for instance, we show that
convergence to cooperation is exponentially slow for small values of p. Thus a high degree of
forgiveness seems necessary for cooperation to emerge. Perhaps our most important message is
that a Rational Pavlov player can reduce the risk of being exploited without compromising the
emergence of cooperation.

We have analysed SRP as well. The analysis is quite similar to that of RP. Therefore, we do 
not present it in this paper. Instead, we make some remarks on the final results under relevant 
sections. 

1.4 Preliminaries 

Much of the notation and terminology used in this paper is adopted from [4]. We consider n
players arranged as the vertices of a cycle graph $G = (V, E)$, where $V = \{0, \ldots, n-1\}$ and
$E = \{\{i, i+1\} : 0 \le i \le n-1\}$. Hence, vertex i can interact only with the vertices i − 1 and
i + 1. Here and throughout the paper, addition and subtraction on vertices is performed modulo
n.

The agent at the vertex i ($0 \le i < n$) has a state $S_i \in \{-1, 1\}$, where −1 represents defection
and 1 represents cooperation. We will denote the cooperator-states, or 1's, also as +'s (pluses),
and the defector-states, or −1's, as −'s (minuses). Each edge of the graph has a state which is
determined by the states of its end vertices. Thus, an edge of the graph might be in any of four
states, −−, −+, +−, ++, as shown in the state transition diagrams in Figure 2.

In this study, the game is played in the following way. At each stage, an edge of the cycle is
selected uniformly at random. The agents connected by this edge play the game using RP and
update their strategies accordingly. In this process, emergence of cooperation means reaching
the state where everyone cooperates, in other words, reaching the state $S^*$ with $S_i = 1$ for all
$i \in V$. The state $S^*$ is the unique absorbing state of this process.
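
The process just described can be simulated directly. The sketch below is an illustration rather than the program used for the experiments in this paper: the function names, the seed handling and the step cap are assumptions, and it assumes n ≥ 3 so that the cycle has n distinct edges.

```python
import random


def rp_step(state, p, rng):
    """Select one edge of the cycle uniformly at random and update it using RP.

    `state` is a list over {-1, 1}; 1 is cooperation and -1 is defection.
    """
    n = len(state)
    i = rng.randrange(n)          # the edge {i, i+1}, indices taken modulo n
    j = (i + 1) % n
    a, b = state[i], state[j]
    if a == 1 and b == 1:
        return                    # ++ : both received R, both stay
    if a != b:
        state[i] = state[j] = -1  # +- or -+ : the cooperator switches to defection
        return
    # -- : each player independently cooperates with probability p
    state[i] = 1 if rng.random() < p else -1
    state[j] = 1 if rng.random() < p else -1


def steps_to_cooperation(n, p, seed=0, max_steps=10**6):
    """Steps until the all-cooperate state is reached from all-defect, or None."""
    rng = random.Random(seed)
    state = [-1] * n
    for t in range(max_steps + 1):
        if all(s == 1 for s in state):
            return t
        rp_step(state, p, rng)
    return None


if __name__ == "__main__":
    for p in (1.0, 0.9):
        print(p, steps_to_cooperation(50, p))
```

Starting from the all-defect configuration matches the initial condition of Theorem 2; any other initial configuration could be substituted in `steps_to_cooperation`.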

We will use the following terminology. Let $S \in \{-1, 1\}^V$ be given. A plus-run (resp.
minus-run) in S is an interval $[i, j]$, where $0 \le i, j < n$, such that $S_k = 1$ (resp. −1) for $i \le k \le j$
and $S_{i-1} = -1$ (resp. 1), $S_{j+1} = -1$ (resp. 1). (It is possible to have j < i, since we are
working modulo n.) Clearly all runs are disjoint. The length of a minus-run $R_d$, denoted by
$\ell(R_d)$, equals the number of minuses in the run. We will refer to a minus-run of length $\ell$ as
an $\ell_d$-run where the subscript "d" stands for defectors. We use similar variables for a plus-run
with subscript "c", which stands for cooperators.

We now give some definitions for minus-runs, which are equally applicable to plus-runs if
the signs are changed, and the subscript c is used. A $1_d$-run is also called a singleton minus,
and a $2_d$-run is also called a pair of minuses. There are two outer rim edges associated with a
minus-run $R_d = [i, j]$, namely $\{i-1, i\}$ and $\{j, j+1\}$. The all-minuses configuration is not a
run as we have defined it, since it has no bordering pluses; we will nevertheless refer to it as the
$n_d$-run.
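
The run terminology can be made concrete with a short sketch that lists the lengths of the minus-runs of a cyclic configuration and sums a weight over them. The function names and the `weight` argument are hypothetical; the all-minuses configuration is reported as a single run of length n, following the convention above.

```python
def minus_run_lengths(state):
    """Lengths of all minus-runs in a cyclic configuration over {-1, 1}."""
    n = len(state)
    if 1 not in state:
        return [n]                # the all-minuses configuration
    # Start scanning just after some plus, so no run is split by the wrap-around.
    start = state.index(1)
    runs, length = [], 0
    for k in range(1, n + 1):
        if state[(start + k) % n] == -1:
            length += 1
        elif length:
            runs.append(length)   # a plus closes the current minus-run
            length = 0
    return runs


def total_weight(state, weight):
    """Sum of weight(l) over all minus-runs, where l is the run length."""
    return sum(weight(l) for l in minus_run_lengths(state))


example = [-1, 1, 1, -1, -1, 1, -1]
print(minus_run_lengths(example))  # [2, 2]
```

In the example, the minuses at positions 6 and 0 form a single run because positions are taken modulo n.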

Finally, the parameter of both RP and SRP will be denoted by p, but the context should 
always make the meaning clear. The following theorems summarise our results. 

Theorem 1. Suppose n players, arranged as the vertices of a cycle, play the IPD game using
Rational Pavlov (RP) with parameter $p \ge 0.870$. Then, there is a constant $\omega > 0$ such that the
probability that the all-cooperate state is not reached in time

$$\frac{n}{\omega} \log \frac{n}{\varepsilon}$$

is at most $\varepsilon$, for any $\varepsilon > 0$.

Theorem 2. Suppose n players, arranged as the vertices of a cycle, play the IPD game using
Rational Pavlov (RP) with parameter p. Suppose all players play defect when the game is
started. Then there exists a constant $p_1 > 0$ such that, for all $p < p_1$, it takes time exponential
in n for the all-cooperate state to be reached, except for probability exponentially small in n.

Theorem 3. Suppose n players, arranged as the vertices of a cycle, play the IPD game using 
Rational Pavlov (RP) with p = 0. Provided there is at least one defector on the cycle at the 
beginning of the game, the game converges to defection in time Tn where Tn lies within the range 



Remark 1. In this paper, an event $Y_n$ which depends on the size of the graph n is said to happen
with high probability, or in short w.h.p., to mean that $\Pr(Y_n) \to 1$ as $n \to \infty$.

The outline of the rest of this paper is as follows. In Section 2 we derive the conditions
for fast convergence to cooperation when RP is used on the cycle. In Section 3 we prove that
convergence to cooperation is slow for small values of p. Section 4 concentrates on a special
case where defection emerges fast on the cycle. Experimental results are presented in Section 5.
Finally, Section 6 presents our concluding remarks.

2 Fast convergence on the cycle 

The convergence rate of the IPD has been analysed in [4] by finding a nonnegative integer-valued
potential function $\psi : \{-1, 1\}^V \to \mathbb{R}$ such that $\psi(S) = 0$ when $S = S^*$ and $\psi(S) > 0$ otherwise.
Then, Dyer et al. [4] proved that the expectation of the function $\psi$, which measures the distance
from the absorbing state $S^*$ to any given state $S$, decreases with non-null probability till the
absorbing state is reached. We use a similar approach here, but with a simpler potential function
$\phi(S)$. This function is defined as

$$\phi(S) = \sum_{\ell=1}^{n} w_\ell r_\ell, \quad \text{where } r_\ell \text{ is the number of } \ell_d\text{-runs, and } w_\ell > 0 \text{ is the weight of an } \ell_d\text{-run.}$$

Note that $\phi(S) = 0$ when $S = S^*$ and $\phi(S) > 0$ otherwise. In this section, we prove
Theorem 1, which shows that the emergence of cooperation is fast when using RP in the IPD for high
values of p. This is done by studying the changes in the total weight of the minus-runs. Hence,
in this section, a run means a run of minuses unless otherwise stated.

2.1 Analysis 

We first consider the minus-runs that are separated from their adjacent runs by at least two 
pluses. When two minus-runs are separated by a singleton plus, choosing the outer rim edges 
of the singleton plus causes the two runs to merge together. This case therefore needs some 
special consideration and is addressed at the end of this section. 

We need to show that the expectation of $\phi$ decreases after every iteration of the game. This
requirement can be modelled by having a constraint that the expected total weight of the runs
created by hitting an overlapping edge of an $\ell_d$-run ($\ell = 1, 2, \ldots, n$), denoted by $E[s_\ell]$, is strictly
less than the original weight $w_\ell$. We will now consider runs of different lengths in turn, and find
the corresponding constraint.

A $1_d$-run. For a $1_d$-run, there are only two edges which overlap this run. Choosing either of
these edges will produce a $2_d$-run. Therefore, the $1_d$-run can be handled by adding the following
constraint to the formulation.

$$E[s_1] = \frac{1}{n}\bigl(2w_2 + (n-2)w_1\bigr) \le (1-\delta)w_1,$$

for small $\delta > 0$. Let $\delta = \omega/n$. Thus we obtain

$$2w_2 - (2-\omega)w_1 \le 0. \tag{2}$$
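
The step from the constraint on $E[s_1]$ to inequality (2) is a routine expansion. Multiplying the constraint by n and substituting $\delta = \omega/n$ gives

$$2w_2 + (n-2)w_1 \le (n - \omega)\,w_1 \quad\Longleftrightarrow\quad 2w_2 - 2w_1 + \omega w_1 \le 0 \quad\Longleftrightarrow\quad 2w_2 - (2-\omega)w_1 \le 0 .$$

The same two operations, multiplying by n and replacing $\delta n$ by $\omega$, are used for the longer runs considered next.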

An $\ell_d$-run, where $2 \le \ell \le n-1$. There are $\ell + 1$ edges which overlap this run. Two of them
are outer rim edges, and selecting either for the update causes the run to grow in length by 1.
All other $\ell - 1$ overlapping edges are in state −−. Let us number these edges $1, 2, \ldots, \ell-1$.
According to the strategy RP, if the edge $i \in \{1, 2, \ldots, \ell-1\}$ is chosen for the play, this edge
will become ++ with probability $p^2$, producing an $(i-1, \ell-i-1)$-split. Similarly, this edge
might go to the state +− or −+, each with probability $p(1-p)$, resulting in an $(i-1, \ell-i)$-split or
an $(i, \ell-i-1)$-split respectively. The edge might also remain in the same state with probability
$(1-p)^2$. Finally, there is a chance of not hitting any of the overlapping edges of the run, leaving
the $\ell_d$-run intact. We can now compute the expected new weight of the run after one step of
the game, by combining these cases. Hence we have

$$E[s_\ell] = \frac{1}{n}\Bigl(2w_{\ell+1} + p^2\sum_{i=1}^{\ell-1}(w_{i-1} + w_{\ell-i-1}) + p(1-p)\sum_{i=1}^{\ell-1}(w_{i-1} + w_{\ell-i}) + p(1-p)\sum_{i=1}^{\ell-1}(w_i + w_{\ell-i-1}) + (1-p)^2(\ell-1)w_\ell\Bigr) + \frac{n-\ell-1}{n}\,w_\ell \le (1-\delta)w_\ell .$$

This inequality can be simplified to

$$2w_{\ell+1} + p^2\sum_{i=1}^{\ell-1}(w_{i-1} + w_{\ell-i-1}) + p(1-p)\sum_{i=1}^{\ell-1}(w_{i-1} + w_{\ell-i}) + p(1-p)\sum_{i=1}^{\ell-1}(w_i + w_{\ell-i-1}) + (1-p)^2(\ell-1)w_\ell \le (\ell+1-\omega)w_\ell .$$

Hence, we have

$$2w_{\ell+1} + 2p^2\sum_{i=0}^{\ell-2} w_i + 2p(1-p)\Bigl(\sum_{i=0}^{\ell-2} w_i + \sum_{i=1}^{\ell-1} w_i\Bigr) + (1-p)^2(\ell-1)w_\ell \le (\ell+1-\omega)w_\ell .$$

Thus,

$$2w_{\ell+1} + 2p^2\sum_{i=0}^{\ell-2} w_i + 2p(1-p)\Bigl(2\sum_{i=0}^{\ell-2} w_i - w_0 + w_{\ell-1}\Bigr) + (1-p)^2(\ell-1)w_\ell \le (\ell+1-\omega)w_\ell .$$

Let $w_0 = 0$. Then, for $2 \le \ell \le n-1$, we have

$$2w_{\ell+1} + 2p(2-p)\sum_{i=0}^{\ell-2} w_i + 2p(1-p)w_{\ell-1} + \bigl(\ell(p^2-2p) - (p^2-2p+2) + \omega\bigr)w_\ell \le 0 . \tag{3}$$

The $n_d$-run. For the $n_d$-run, choosing any edge will cause the run to decrease in length by 2
with probability $p^2$, to decrease in length by 1 with probability $2p(1-p)$, and to remain the
same with probability $(1-p)^2$. Thus we obtain

$$\frac{1}{n}\bigl(p^2 w_{n-2} + 2p(1-p)w_{n-1} + (1-p)^2 w_n\bigr)\,n \le (1-\delta)w_n .$$

Simplifying this inequality yields

$$p^2 w_{n-2} + 2p(1-p)w_{n-1} + (p^2 - 2p + \delta)w_n \le 0 . \tag{4}$$

Finally, consider the case where two adjacent runs are separated by a singleton plus. Suppose
the lengths of these runs are $(\ell_1 - 1)$ and $\ell_2$. If we delete the singleton plus which separates them,
a run of length $\ell_1 + \ell_2$ is created. Let us count this as two runs of length $\ell_1$ and $\ell_2$. In other
words, we calculate the resulting weight as $w_{\ell_1} + w_{\ell_2}$ whereas the true weight is $w_{\ell_1+\ell_2}$. We need
to ensure that this does not underestimate the true weight. This can be done by adding the inequalities

$$w_{\ell_1} + w_{\ell_2} \ge w_{\ell_1+\ell_2} . \tag{5}$$






2.2 Determining the weights 

We now show that we can find appropriate values for the weights $w_\ell$ satisfying inequalities (2)
to (5). This will imply that the total weight of the runs in a cycle decreases
in expectation after every iteration of the game, leading to a fast (polynomial) convergence rate.
We also determine a range of p favouring fast convergence.
Solving for $w_{\ell+1}$ in inequality (3) gives the recurrence

$$\hat{w}_{\ell+1} = -p(2-p)\sum_{i=0}^{\ell-2}\hat{w}_i - p(1-p)\hat{w}_{\ell-1} - \frac{1}{2}\bigl(\ell(p^2-2p) - (p^2-2p+2) + \delta n\bigr)\hat{w}_\ell , \tag{6}$$

for $2 \le \ell \le n-1$. And, from (2), we have

$$\hat{w}_2 = \Bigl(1 - \frac{\omega}{2}\Bigr)\hat{w}_1 . \tag{7}$$

Here $\hat{w}_\ell$ denotes the sequence generated by these recurrences. Define $g(\ell)$ by

$$g(\ell) = \frac{\hat{w}_\ell}{\ell} .$$
(a) When $p \ge 0.870$, as $\ell$ increases, $g(\ell)$ decreases initially and then increases
exponentially. When $p \le 0.869$, $g(\ell)$ decreases continuously.

(b) When p is less than 0.870, no local minimum was detected, thus no valid value of $\ell_0$ is
shown in the figure. It can be noted that $\ell_0 \le 8$ when $p \ge 0.870$.

Figure 3: Results from a C++ program: (a) p values plotted as a contour for $g(\ell)$ versus $\ell$,
and (b) experimental values for $\ell_0$.



A computational study suggests that there exists a $p_0$ such that $g(\ell)$ has a positive minimum
for $p > p_0$, and $g(\ell)$ decreases monotonically for $p < p_0$. This is summarised in Figure 3(a).
Provided this happens, suppose the minimum value $\alpha$ occurs at $\ell = \ell_0(p)$ for $p > p_0$. We will
write $\ell_0(p)$ simply as $\ell_0$ for notational simplicity. (See Figure 3(b).) Then

$$\alpha = g(\ell_0) = \frac{\hat{w}_{\ell_0}}{\ell_0} .$$

We use this property of the function $g(\ell)$, i.e. having a minimum for high p values, to define
the weights $w_\ell$. Lemma 4 below gives the proof of existence for $\ell_0$. Now, for $p > p_0$, define the
weights of the runs as

$$w_\ell = \begin{cases} \hat{w}_\ell & \text{if } \ell \le \ell_0, \\ \alpha\ell & \text{otherwise,} \end{cases} \tag{8}$$

where $\hat{w}_\ell$ denotes the solution of the recurrences (6) and (7), and $\alpha$ is a local minimum of the
function $g(\ell)$. The following lemma proves the validity of
the assumption that $g(\ell)$ has a minimum when $p > p_0$.

Lemma 4. There exists $p_0 < 0.870$ such that $g(\ell)$ has a minimum when $p > p_0$.

Proof. To prove the lemma, we determine polynomial functions of p satisfying the first few
terms of the recurrences (6) and (7), with seeds $\hat{w}_0 = 0$ and $\hat{w}_1 = 1$. We use these to find inequalities
which determine the range of p such that $g(\ell)$ has a minimum. We then solve these numerically.
In fact, we use the change in $g(\ell)$ at a given $\ell$, which we denote by $h(\ell)$. That is,

$$h(\ell) = g(\ell+1) - g(\ell) .$$

Thus, if $g(\ell)$ has its first local minimum at $\ell = \ell_0$, $h(\ell)$ will be negative for $\ell = 1, 2, \ldots, \ell_0 - 1$
and positive at $\ell = \ell_0$. For simplicity, we assume that $\omega = 0$ in the calculations that follow.
Now, solving (6) and (7) for $h(\ell)$, with $\hat{w}_0 = 0$ and $\hat{w}_1 = 1$, we obtain:

1. $h(1) = -\frac{1}{2}$. Hence, $h(1) < 0$ for all $0 \le p \le 1$.

2. $h(2) = -\frac{1}{6} + \frac{1}{6}p^2$. Hence, $h(2) \le 0$ for all $0 \le p \le 1$.

3. $h(3) = -\frac{1}{12} - \frac{1}{4}p + \frac{5}{24}p^2 + \frac{1}{4}p^3 - \frac{1}{8}p^4$. Hence, $h(3) \le 0$ for all $0 \le p \le 1$.

4. $h(4) = -\frac{1}{20} - \frac{7}{20}p - \frac{3}{8}p^2 + \frac{21}{20}p^3 + \frac{11}{40}p^4 - \frac{3}{5}p^5 + \frac{3}{20}p^6$. Hence, $h(4) < 0$ for $0 \le p \le 0.897$.

Continuing in this way, as $\ell$ goes from 5 to 10, the range of p for which $h(\ell) < 0$ becomes
gradually smaller:

• $h(5) < 0$ for $0 \le p \le 0.877$.

• $h(6) < 0$ for $0 \le p \le 0.871$.

• $h(7) < 0$ for $0 \le p \le 0.870$.

• $h(8) < 0$ for $0 \le p \le 0.869$.

Here the upper bounds for p are rounded to three decimal places. Note that $h(8)$ is positive
if $p \ge 0.870$. Therefore, if $p \ge 0.870$, $h(\ell)$ is negative for $1 \le \ell \le 7$ and positive for $\ell = 8$. Thus
$g(\ell)$ decreases up to $\ell = 8$ and increases at $\ell = 9$. Hence, by definition, $\ell_0 \le 8$ when $p \ge 0.870$,
and the lemma is proved. □
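
The polynomials in the proof can be checked numerically by iterating the recurrences directly. The sketch below is an illustration and not the C++ program mentioned in Figure 3; the function names are hypothetical, and it uses the seeds $w_0 = 0$, $w_1 = 1$ and, by default, $\omega = 0$ as in the proof.

```python
def recurrence_weights(p, max_len, omega=0.0):
    """Weights w_0..w_max_len from recurrences (6) and (7), seeds w_0=0, w_1=1."""
    w = [0.0, 1.0, 1.0 - omega / 2.0]      # w_2 = (1 - omega/2) * w_1, from (7)
    for l in range(2, max_len):
        prefix = sum(w[: l - 1])           # sum of w_i for i = 0..l-2
        coeff = l * (p * p - 2 * p) - (p * p - 2 * p + 2) + omega
        w.append(-p * (2 - p) * prefix - p * (1 - p) * w[l - 1] - 0.5 * coeff * w[l])
    return w[: max_len + 1]


def g(w, l):
    """g(l) = w_l / l."""
    return w[l] / l


def h(w, l):
    """h(l) = g(l+1) - g(l)."""
    return g(w, l + 1) - g(w, l)


if __name__ == "__main__":
    p = 0.9
    w = recurrence_weights(p, 10)
    for l in range(1, 10):
        print(l, round(g(w, l), 6), round(h(w, l), 6))
```

For example, with p = 1/2 the recurrence gives $w_3 = 1 + \frac{1}{2}p^2 = 1.125$ and $w_4 = 0.96875$, in agreement with the polynomial expressions obtained by hand.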

Next we prove two properties of the weights $w_\ell$, which will be used later in the proof.

Lemma 5. $w_\ell$ is a non-decreasing sequence.

Proof. Recall that $w_\ell = \hat{w}_\ell$ for $\ell \le \ell_0$. Furthermore, from Lemma 4 we know that $\ell_0 \le 8$ for
$p > p_0$. Hence, we first prove that $w_\ell$ is increasing up to $\ell_0 \le 8$, by proving that $\hat{w}_\ell$ is increasing
as $\ell$ goes from 1 to 8. This is done by solving the recurrences (6) and (7), with seeds $\hat{w}_0 = 0$
and $\hat{w}_1 = 1$. Here also, for simplicity, we assume that $\omega = 0$. Let $f(\ell)$ be defined by

$$f(\ell) = \hat{w}_{\ell+1} - \hat{w}_\ell .$$

Then it suffices to show that $f(\ell) \ge 0$ for $\ell = 0, 1, \ldots, 7$. But we have

1. $f(0) = 1$. Hence, $f(0) > 0$ for all $0 \le p \le 1$.

2. $f(1) = 0$. Hence, $f(1) \ge 0$ for all $0 \le p \le 1$.

3. $f(2) = \frac{1}{2}p^2$. Hence, $f(2) \ge 0$ for all $0 \le p \le 1$.

4. $f(3) = -p + p^2 + p^3 - \frac{1}{2}p^4$. Hence, $f(3) \ge 0$ for $0.689 \le p \le 1$.

5. $f(4) = -2p - \frac{3}{2}p^2 + \frac{11}{2}p^3 + \frac{5}{4}p^4 - 3p^5 + \frac{3}{4}p^6$. Hence, $f(4) \ge 0$ for $0.805 \le p \le 1$.

Likewise, we obtain

• $f(5) \ge 0$ for $0.850 \le p \le 1$.

• $f(6) \ge 0$ for $0.865 \le p \le 1$.

• $f(7) \ge 0$ for $0.869 \le p \le 1$.

All these together show that $w_\ell$ is increasing in the range $\ell = 1, 2, \ldots, \ell_0$ for $0.869 \le p \le 1$.
The lemma then follows from the definition (8), since $w_\ell = \alpha\ell$ is increasing when $\ell > \ell_0$. □

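Lemma 6. $w_\ell/\ell$ is a non-increasing sequence.

Proof. From the definition of $\ell_0$, $g(\ell) = \hat{w}_\ell/\ell$ is decreasing for $1 \le \ell \le \ell_0$. Moreover, $w_\ell = \hat{w}_\ell$ for $\ell \le \ell_0$.
Therefore $w_\ell/\ell$ also decreases for $1 \le \ell \le \ell_0$. When $\ell > \ell_0$, we have $w_\ell/\ell = \alpha$, which is obviously
non-increasing. □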

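The following lemmas show that the inequalities (2) to (5) are satisfied by the proposed
weights.

Lemma 7. The weights $w_\ell$ defined in (8), with seeds $\hat{w}_0 = 0$ and $\hat{w}_1 = 1$, satisfy inequalities (2)
and (3).

Proof. Let $k \ge \ell_0$. Then, from (3) we get

$$w_{k+1} \le -p(2-p)\sum_{i=0}^{k-2} w_i - p(1-p)w_{k-1} - \frac{1}{2}\bigl(k(p^2-2p) - (p^2-2p+2) + \omega\bigr)w_k . \tag{9}$$

Let

$$c_i = w_i - i\alpha . \tag{10}$$

Then, substituting (8) and (10) into (9), we obtain

$$(k+1)\alpha \le -p(2-p)\sum_{i=0}^{k-2}(i\alpha + c_i) - p(1-p)\bigl[(k-1)\alpha + c_{k-1}\bigr] - \frac{1}{2}\bigl(k(p^2-2p) - (p^2-2p+2) + \omega\bigr)k\alpha .$$

Simplifying yields

$$k \ge \frac{\alpha + \alpha p + p(2-p)\sum_{i=0}^{k-2} c_i + p(1-p)c_{k-1}}{\alpha\bigl(p - \frac{1}{2}\omega\bigr)} . \tag{11}$$

If $k = \ell_1$ ($\ge \ell_0$), we get

$$\ell_1 \ge \frac{\alpha + \alpha p + p(2-p)\sum_{i=0}^{\ell_1-2} c_i + p(1-p)c_{\ell_1-1}}{\alpha\bigl(p - \frac{1}{2}\omega\bigr)} .$$

But, from (8) and (10), we know that $c_k = 0$ for $k \ge \ell_0$. Since $\ell_1 > \ell_0$, we have $c_{\ell_1-1} = 0$
and $\sum_{i=0}^{\ell_1-2} c_i = \sum_{i=0}^{\ell_0-1} c_i$. Hence

$$\ell_1 \ge \frac{\alpha + \alpha p + p(2-p)\sum_{i=0}^{\ell_0-1} c_i}{\alpha\bigl(p - \frac{1}{2}\omega\bigr)} = \ell_t \ \text{(say)} .$$

Thus, inequality (3) holds for $\ell \ge \ell_t$. We also know that (3) holds when $\ell < \ell_0$ by (6).
Therefore, showing $\ell_t - \ell_0 \le 1$ will mean that (3) is true for all values of $\ell$. To do this, we
determine a lower bound on $\ell_0$ by substituting $k = \ell_0$ into (11). Thus

$$\ell_0 \ge \frac{\alpha + \alpha p + p(2-p)\sum_{i=0}^{\ell_0-2} c_i + p(1-p)c_{\ell_0-1}}{\alpha\bigl(p - \frac{1}{2}\omega\bigr)} = \ell_b \ \text{(say)} .$$

Hence we have

$$\ell_t - \ell_0 \le \ell_t - \ell_b = \frac{p(2-p)c_{\ell_0-1} - p(1-p)c_{\ell_0-1}}{\alpha\bigl(p - \frac{1}{2}\omega\bigr)} = \frac{p\,c_{\ell_0-1}}{\alpha\bigl(p - \frac{1}{2}\omega\bigr)} \to \frac{c_{\ell_0-1}}{\alpha}, \quad \text{since } \frac{p}{p - \frac{1}{2}\omega} \to 1 \text{ when } \omega \to 0 .$$

Substituting (10) into the above inequality yields

$$\ell_t - \ell_0 \le \frac{w_{\ell_0-1} - \alpha(\ell_0 - 1)}{\alpha} = 1 - \frac{w_{\ell_0} - w_{\ell_0-1}}{\alpha} \le 1, \quad \text{since } w_{\ell_0} \ge w_{\ell_0-1} \text{ from Lemma 5.}$$

Finally, it can easily be verified that (2) holds with $w_1 = \hat{w}_1 = 1$ and $w_2 = \hat{w}_2 = 1 - \omega/2$,
completing the proof. □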

Lemma 8. The weights $w_\ell$ defined in (8), with seeds $\hat{w}_0 = 0$ and $\hat{w}_1 = 1$, satisfy inequality (4).

Proof. Assume $\ell_0 \le 8$ and $n \ge 10$, so $w_\ell = \ell\alpha$ for $\ell = n-2, n-1, n$. Then we require

$$p^2(n-2)\alpha + 2p(1-p)(n-1)\alpha + (p^2 - 2p + \delta)n\alpha \le 0, \tag{12}$$

which simplifies to

$$\omega \le 2p,$$

so is satisfied for small enough $\omega$. □
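
The simplification in the proof of Lemma 8 can be checked by a direct expansion. Dividing the required inequality by $\alpha > 0$ and using $\delta n = \omega$, the left-hand side becomes

$$p^2(n-2) + 2p(1-p)(n-1) + (p^2 - 2p + \delta)n = n\bigl[p^2 + 2p(1-p) + p^2 - 2p\bigr] - 2p^2 - 2p(1-p) + \delta n = -2p + \omega ,$$

since the bracket multiplying n vanishes. The requirement is therefore $-2p + \omega \le 0$, that is, $\omega \le 2p$.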
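Lemma 9. The weights $w_\ell$ defined in (8), with seeds $\hat{w}_0 = 0$ and $\hat{w}_1 = 1$, satisfy inequality (5).

Proof.

$$w_{\ell_1} + w_{\ell_2} = \ell_1\frac{w_{\ell_1}}{\ell_1} + \ell_2\frac{w_{\ell_2}}{\ell_2} .$$

But, from Lemma 6, $w_\ell/\ell$ is a non-increasing sequence. Thus,

$$w_{\ell_1} + w_{\ell_2} \ge (\ell_1 + \ell_2)\frac{w_{\ell_1+\ell_2}}{\ell_1+\ell_2} = w_{\ell_1+\ell_2} ,$$

proving the lemma. □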

Proof of Theorem 1. Consider an $\ell_d$-run where $1 \le \ell \le n$. Denote by $E[s_\ell]$ the expected
weight of the resulting runs. Then, we have

$$E[s_\ell] \le (1-\delta)w_\ell ,$$

for $p > p_0$, since the weights (8) satisfy the constraints (2) to (5) by Lemma 7, Lemma 8 and
Lemma 9. Furthermore, Lemma 4 showed that the weights in RP can be defined this way for
$p \ge 0.870$.

Now, suppose the initial state of the cycle is $S_0$. Let $S_t$ be the resulting state of the cycle
after t steps. Then the expected total weight after one step of the game is

$$E[\phi(S_1) \mid S_0] = \sum_\ell E[s_\ell]\,r_\ell \le \sum_\ell (1-\delta)w_\ell r_\ell = (1-\delta)\phi(S_0) .$$

Thus, by total expectation, we have

$$E[\phi(S_1)] \le (1-\delta)E[\phi(S_0)] .$$

We have $\phi(S) \le n$. To see this, note that for an $\ell_d$-run, $w_1 = 1 \ge w_\ell/\ell = \hat{w}_\ell/\ell \ge \alpha$ when
$1 \le \ell \le \ell_0$. Thus $\alpha\ell \le w_\ell \le \ell$ when $1 \le \ell \le \ell_0$. In particular, this implies $\alpha \le 1$. When $\ell > \ell_0$,
we have $w_\ell = \alpha\ell \le \ell$, implying $w_\ell \le \ell$ for all $1 \le \ell \le n$. Summing this over all runs in S, we
have $\phi(S) \le n$. Since $\delta = \omega/n$, we have

$$E[\phi(S_1)] \le \Bigl(1 - \frac{\omega}{n}\Bigr)E[\phi(S_0)] .$$

Applying this for t steps, we obtain

$$E[\phi(S_t)] \le \Bigl(1 - \frac{\omega}{n}\Bigr)^t E[\phi(S_0)] \le \Bigl(1 - \frac{\omega}{n}\Bigr)^t n \le e^{-\omega t/n}\,n \le \varepsilon ,$$

when

$$t \ge \frac{n}{\omega}\log\Bigl(\frac{n}{\varepsilon}\Bigr) .$$

We also know that, for any $S \ne S^*$, $\phi(S) \ge w_1 \ge 1$ by Lemma 5. Thus, using Markov's
inequality, we obtain

$$\Pr[\phi(S_t) \ne 0] = \Pr[\phi(S_t) \ge 1] \le E[\phi(S_t)] \le \varepsilon ,$$

and the theorem is proved. □

Remark 2. The above requires satisfying (2) to (5). These are all linear inequalities. Therefore,
we can solve them by linear programming. Initially, we solved the problem this way, obtaining
the same results as above.

Remark 3. The problem for SRP can be formulated and solved in the same way as described
in Section 2 for RP. We did this and found that the convergence to cooperation is fast when
$p \ge 0.699$. So, the range of p for which the convergence is fast is bigger for SRP than for RP.
This is somewhat expected because, for a given p, SRP is more forgiving than RP.
3 Slow convergence on the cycle

In Section 2 we proved that the IPD game converges to cooperation fast for high values of p. It
raises an interesting research question: how fast or slow is the convergence when p is small? In
this section, we answer this question by proving Theorem 2, which shows that the convergence
to cooperation takes time exponential in n for small enough p. The idea of the proof is to show
that it takes exponential time for a plus-run of length $\Omega(n)$ to be formed. (This is done by
analysing plus-runs on the cycle. Therefore, in this section, a run refers to a run of "pluses"
unless otherwise stated.) It obviously follows that it takes exponential time for the all-cooperate state
to be reached.

3.1 Problem formulation 

Let $\mathcal{R}_i^\ell(t)$ denote the event that a run of $\ell$ pluses (an $\ell_c$-run) starts at position $i$ at time $t$, i.e.
$S_k = 1$ for $i \le k \le i + \ell - 1$ and $S_{i-1} = S_{i+\ell} = -1$. By the symmetry of the cycle, and the
initial configuration, $\Pr(\mathcal{R}_i^\ell)$ will be the same for all $i$. Let $\delta_j(t)$ denote the event that $S_j$ is a
minus at time $t$, i.e. $\delta_j(t) = \{S_j = -1\}$. We will write $\mathcal{R}_i^\ell(t)$ and $\delta_j(t)$ simply as $\mathcal{R}_i^\ell$ and $\delta_j$
respectively, to ease the notation. Then we will define $P_\ell^t$ ($\ell = 0, 1, \ldots, n-1$) to be

$$P_\ell^t = \Pr(\mathcal{R}_i^\ell \mid \delta_{i-1}) \qquad (\ell = 0, 1, \ldots, n) .$$

The conditioning on $\delta_{i-1}$ means that the probability is an upper bound on $\Pr(\mathcal{R}_i^\ell)$. This
follows since, if $S_{i-1} = +1$, a plus-run cannot start at $i$. Recall that $\ell = 0$ means the length
of the plus-run is 0. Hence, in particular, $P_0^t$ is an upper bound on the probability that there
are minuses at positions $i-1$ and $i$. An advantage of this approximation is that the $P_\ell^t$ are a
probability distribution for $\ell = 1, 2, \ldots, n$, whereas the quantities $\Pr(\mathcal{R}_i^\ell)$ do not sum to 1 in
general.

Later in the proof, we will need to calculate an upper bound on the probability that two
plus-runs are separated by two minuses. That is, we need to calculate an upper bound on the
joint probability $\Pr(\mathcal{R}_i^\ell \wedge \mathcal{R}_j^m)$ where $i = j + m + 2$. But, we have

$$\Pr(\mathcal{R}_i^\ell \wedge \mathcal{R}_j^m) = \Pr(\mathcal{R}_i^\ell \mid \mathcal{R}_j^m)\,\Pr(\mathcal{R}_j^m) . \tag{13}$$

We will use the fact that, conditional on $\delta_r$, the $S_q$ for $q > r$ and the $S_k$ for $k < r$ are independent,
if the vertices $k$ and $q$ belong to different plus-runs and there is at least one more plus-run on
the cycle. Under this condition, changes to the $S_q$ occur independently from those to the $S_k$,
since all steps are independent and affect only two adjacent vertices. The structure of the cycle
means that changes to the $S_q$ can only be percolated to the $S_k$ through the vertex $r$, on which
we have conditioned. Thus, given $\delta_{i-1}$, $\mathcal{R}_i^\ell$ is conditionally independent of $\mathcal{R}_j^m$, provided
there is at least another plus-run on the cycle. The assumption of having at least three plus-runs
holds initially because the game is started with all-minuses, which means there are $n$ 0-runs of
pluses. Moreover, we will then show that it takes exponential time for a plus-run of length $n/4$
to be formed. To summarise, we may assume

$$\Pr(\mathcal{R}_i^\ell \mid \mathcal{R}_j^m) = \Pr(\mathcal{R}_i^\ell \mid \delta_{i-1}) . \tag{14}$$

Therefore, from (13) and (14), we have

$$\Pr(\mathcal{R}_i^\ell \wedge \mathcal{R}_j^m) = \Pr(\mathcal{R}_i^\ell \mid \delta_{i-1})\,\Pr(\mathcal{R}_j^m) \le \Pr(\mathcal{R}_i^\ell \mid \delta_{i-1})\,\Pr(\mathcal{R}_j^m \mid \delta_{j-1}) = P_\ell^t P_m^t .$$

Note that this inequality and the argument are also applicable when one or both runs are of
length 0 and separated by one minus, i.e. for $\Pr(\mathcal{R}_j^m \wedge \mathcal{R}_{j+m+1}^0)$ and $\Pr(\mathcal{R}_j^0 \wedge \mathcal{R}_{j+1}^\ell)$. We use this
below without referring further to the details.

We do not explicitly determine a $p_1$ for which slow convergence occurs. Though this is
possible in principle with our methods, the simpler approach we have chosen already leads to
very cumbersome calculations. Our approach, therefore, is to regard $p$ as small, and use the $O$
and $o$ notation to indicate the order of approximations. Thus there will be some small enough
constant $p_1$ for which our results hold, but we cannot estimate it. In order that the $O$ and $o$
notation can be applied to both $p$ and $n$ without confusion, we will assume that $n \ge e^{1/p}$.

We will first consider short runs. For simplicity, we will leave the investigation of a $0_c$-run
to the end of this section, and start with $1_c$-runs.

A $1_c$-run. Let $R_c$ be a $1_c$-run at position $i$, i.e. $R_c = [i, i]$. Choosing either of its outer rim edges
causes $R_c$ to be deleted. On the other hand, $R_c$ is created from a $2_c$-run at position $i-1$ if the
edge $\{i-2, i-1\}$ is selected and from a $2_c$-run at position $i$ if the edge $\{i+1, i+2\}$ is selected.
In addition, $R_c$ is created from three consecutive minuses at positions $(i-1)$, $i$, and $(i+1)$ with
probability $p(1-p)$ if either $\{i-1, i\}$ or $\{i, i+1\}$ is selected. The probability of finding three
consecutive minuses is at most $P_0^t$. Combining all this information, we obtain

$$P_1^{t+1} = P_1^t + \frac{1}{n}\left(-2P_1^t + 2P_2^t + 2p(1-p)P_0^t\right) . \tag{15}$$

Note that the coefficient of $P_1^t$ on the right hand side of (15) is positive if $n > 2$, and the other two
variables $P_0^t$ and $P_2^t$ also have positive coefficients. Hence, using the upper bounds of these three
variables yields an upper bound for $P_1^{t+1}$ as required. Also, as we only need an upper bound,
we have ignored the cases where both $i-2$ and $i-1$ are minuses. In that case, choosing the
edge $\{i-2, i-1\}$ causes the $1_c$-run to increase in length by 2 with probability $p^2$ and by 1
with probability $p(1-p)$, effectively deleting $R_c$. We will perform similar approximations for
the other runs investigated below, without mentioning these details further.

The equation (15) is a difference equation with time step 1. Let us rescale so that the new
time step is $1/n$. The difference equation corresponding to the new step size is then as follows.

$$P_1^{\frac{t}{n}+\frac{1}{n}} = P_1^{\frac{t}{n}} + \frac{1}{n}\left(-2P_1^{\frac{t}{n}} + 2P_2^{\frac{t}{n}} + 2p(1-p)P_0^{\frac{t}{n}}\right) . \tag{16}$$

Let $\tau = t/n$ and $h = 1/n$. Then the equation (16) can be written as

$$\frac{P_1^{\tau+h} - P_1^{\tau}}{h} = -2P_1^{\tau} + 2P_2^{\tau} + 2p(1-p)P_0^{\tau} . \tag{17}$$

Now, the difference equation (17) can be approximated by the following differential equation,
with error up to $O(h) = O(1/n) = O(e^{-1/p})$, say, on the right hand side.

$$\frac{dP_1^{\tau}}{d\tau} = -2P_1^{\tau} + 2P_2^{\tau} + 2p(1-p)P_0^{\tau} . \tag{18}$$

A $2_c$-run. Let $R_c$ be a $2_c$-run starting at position $i$, hence $R_c = [i, i+1]$. Similarly to a $1_c$-run,
choosing either of the two outer rim edges of $R_c$ causes the run to decrease in length by 1,
reducing the number of $2_c$-runs on the cycle by 1. $R_c$ is created by choosing the outer rim edge
$\{i-2, i-1\}$ of a $3_c$-run at $(i-1)$. Similarly, $R_c$ is created by choosing the outer rim edge
$\{i+2, i+3\}$ of a $3_c$-run at $i$. In addition, $R_c$ can be created from a singleton plus adjacent
to a pair of minuses. This happens if the edge connecting the pair of minuses is selected and
only the minus next to the singleton plus becomes a plus. The probability for this, given that
the corresponding edge has been selected, is $p(1-p)$. Finally, $R_c$ is created with probability
$p^2$ by selecting the middle edge of four consecutive minuses. The probability of having four
consecutive minuses at location $i$ is at most $(P_0^t)^2$. Therefore we get

$$P_2^{t+1} = P_2^t + \frac{1}{n}\left(-2P_2^t + 2P_3^t + 2p(1-p)P_1^tP_0^t + p^2(P_0^t)^2\right) . \tag{19}$$

As before, rescaling and approximating, we obtain

$$\frac{dP_2^{\tau}}{d\tau} = -2P_2^{\tau} + 2P_3^{\tau} + 2p(1-p)P_1^{\tau}P_0^{\tau} + p^2(P_0^{\tau})^2 . \tag{20}$$
An $\ell_c$-run, where $\ell \ge 3$. Suppose $R_c = [i, j]$ is an $\ell_c$-run for some $\ell \ge 3$ (so $j = i + \ell - 1$). Selecting either of
the two outer rim edges causes $R_c$ to decrease in length by 1. On the other hand, an $(\ell+1)_c$-run
starting at position $(i-1)$ is turned into an $\ell_c$-run starting at position $i$ if the edge $\{i-2, i-1\}$
is chosen; and an $(\ell+1)_c$-run starting at position $i$ becomes an $\ell_c$-run starting at the same
position $i$ if the edge $\{j+1, j+2\}$ is chosen. Also, if there is a $0_c$-run at $i$ and an $(\ell-1)_c$-run
at $i+1$, choosing the edge $\{i-1, i\}$ will create an $\ell_c$-run starting at location $i$ with probability
$p(1-p)$. We will get the same result if these two runs are in the reverse order: an $(\ell-1)_c$-run at $i$
followed by a $0_c$-run. If there is an $(\ell-2)_c$-run starting at $(i+2)$ and there are minuses at $i-1$, $i$,
and $i+1$, then choosing the edge $\{i, i+1\}$ produces an $\ell_c$-run at $i$ with probability $p^2$. Similarly,
if there is an $(\ell-2)_c$-run at position $i$ and there are minuses at positions $j-1$, $j$ and $j+1$, the
run increases in length by 2 with probability $p^2$ if the edge $\{j-1, j\}$ is selected. Finally, a $k_c$-run
and an $(\ell-2-k)_c$-run, $1 \le k \le \ell-3$, at positions $i$ and $(i+k+2)$ respectively merge with
probability $p^2$, introducing an $\ell_c$-run, if the edge between the runs, namely $\{i+k, i+k+1\}$, is
selected. Thus we have



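$$P_\ell^{t+1} = P_\ell^t + \frac{1}{n}\left(-2P_\ell^t + 2P_{\ell+1}^t + 2p(1-p)P_{\ell-1}^tP_0^t + 2p^2P_{\ell-2}^tP_0^t + p^2\sum_{k=1}^{\ell-3} P_k^tP_{\ell-2-k}^t\right) .$$

This could be written as

$$P_\ell^{t+1} = P_\ell^t + \frac{1}{n}\left(-2P_\ell^t + 2P_{\ell+1}^t + 2p(1-p)P_{\ell-1}^tP_0^t + p^2\sum_{k=0}^{\ell-2} P_k^tP_{\ell-2-k}^t\right) . \tag{21}$$

Here also, we have used the fact that the probability of finding three consecutive minuses is at
most $P_0^t$. Observe that, in this form, the difference equation is equivalent to (19) when
$\ell = 2$. Therefore, we can use (21) for $\ell = 2$ also.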



A $0_c$-run. Finally, consider a run of length zero, i.e. a $0_c$-run. Recall that we have defined $P_0^t$
and $P_1^t$ to be upper bounds on the probability of finding a $0_c$-run and a $1_c$-run respectively, at
position $i$ at time $t$. Now let $\tilde{P}_0^t$ and $\tilde{P}_1^t$ denote the exact values of these probabilities respectively,
i.e. $\tilde{P}_0^t = \Pr(\mathcal{R}_i^0)$ and $\tilde{P}_1^t = \Pr(\mathcal{R}_i^1)$. We can now examine the dynamics of a $0_c$-run. A $0_c$-run
at position $i$ means there are minuses at positions $(i-1)$ and $i$. Then, $i+1$ can be a minus
or a plus. It is not difficult to verify that, if it is a minus, then the $0_c$-run might be deleted
with probability $(3p - p^2)/n$, and if it is a plus, the $0_c$-run might be deleted with probability
$(2p - p^2)/n$. Also note that the probability of finding each of these configurations is at most $\tilde{P}_0^t$.
On the creation side, a $0_c$-run at $i$ could be created from a $1_c$-run at $i$ with probability $2/n$ and
from any longer plus-run at position $i$ with probability $1/n$. By definition, the probability of
finding a $1_c$-run at $i$ is $\tilde{P}_1^t$. It then follows that the probability of finding a plus-run of length
at least 2 at position $i$ is $(1 - \tilde{P}_0^t - \tilde{P}_1^t)$. Hence we obtain

$$P_0^{t+1} = \tilde{P}_0^t + \frac{1}{n}\left(-\tilde{P}_0^t(3p - p^2) - \tilde{P}_0^t(2p - p^2) + 2\tilde{P}_1^t + (1 - \tilde{P}_0^t - \tilde{P}_1^t)\right) .$$

Thus

$$P_0^{t+1} = \tilde{P}_0^t\left(1 - \frac{1 + 5p - 2p^2}{n}\right) + \frac{1}{n}\left(1 + \tilde{P}_1^t\right) . \tag{22}$$

Note that $P_0^{t+1}$ in (22) is an upper bound. Furthermore, the coefficient of $\tilde{P}_0^t$ is positive when
$n > 1 + 5p - 2p^2$, and the coefficient of $\tilde{P}_1^t$ is also positive. We can therefore replace $\tilde{P}_0^t$ and $\tilde{P}_1^t$
with their upper bounds $P_0^t$ and $P_1^t$ respectively, obtaining

$$P_0^{t+1} = P_0^t + \frac{1}{n}\left(-(1 + 5p - 2p^2)P_0^t + P_1^t + 1\right) . \tag{23}$$

Hence we get

$$\frac{dP_0^{\tau}}{d\tau} = -(1 + 5p - 2p^2)P_0^{\tau} + P_1^{\tau} + 1 . \tag{24}$$

3.2 The analysis



In the previous section we modelled the game dynamics by a set of differential equations. We 
first solve the ones corresponding to the runs of length shorter than 3. 

Lemma 10. If the game is started in the all-minuses configuration, the solution to the system
of differential equations (18), (20) and (24) is given by



'P5' 




PI 




PL 





l + (-4 + 3e-^+f 
(1-e 



(5e-2^ - 9e-^)T + 2e-2^T2 - 31e- 
+ (_| + 6e— - |e-2- - - 2e-^W^) p2 + o(p2) 



-o(p2) 



Proof. Note that the differential equations (18) and (24) are linear, while (20) is nonlinear.
Fortunately, we can approximately linearise (20) using some knowledge of the system.

We approximate the solutions with error terms $o(p^2)$. Then, assuming $P_0^\tau = 1 + O(p)$,
$P_1^\tau = O(p)$ and $P_3^\tau = o(p^2)$ linearises (20). The linearised version is given by

$$\frac{dP_2^{\tau}}{d\tau} = 2p(1-p)P_1^{\tau} - 2P_2^{\tau} + p^2 + o(p^2) .$$



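Hence, for short runs, we have the following nonhomogeneous linear system of first order
differential equations.

$$\frac{dP_0^{\tau}}{d\tau} = -(1 + 5p - 2p^2)P_0^{\tau} + P_1^{\tau} + 1$$

$$\frac{dP_1^{\tau}}{d\tau} = 2p(1-p)P_0^{\tau} - 2P_1^{\tau} + 2P_2^{\tau}$$

$$\frac{dP_2^{\tau}}{d\tau} = 2p(1-p)P_1^{\tau} - 2P_2^{\tau} + p^2 + o(p^2)$$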



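In matrix form, the system can be written as

$$\frac{d}{d\tau}\begin{bmatrix} P_0^\tau \\ P_1^\tau \\ P_2^\tau \end{bmatrix} = \begin{bmatrix} -(1+5p-2p^2) & 1 & 0 \\ 2p(1-p) & -2 & 2 \\ 0 & 2p(1-p) & -2 \end{bmatrix}\begin{bmatrix} P_0^\tau \\ P_1^\tau \\ P_2^\tau \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ p^2 + o(p^2) \end{bmatrix} .$$

Let us denote this system by

$$P' = AP + F . \tag{25}$$

Since the game is started with the all-minuses configuration, we have the initial condition

$$P(0) = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} .$$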
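As a numerical sanity check on the linear system $P' = AP + F$ with the all-minuses start, the following Python sketch integrates the three equations with a forward-Euler step. The function name `integrate_short_runs` and the step size `h` are illustrative choices, not part of the paper, and the $o(p^2)$ term is simply dropped, so the output is only indicative up to that order.

```python
def integrate_short_runs(p: float, tau_end: float, h: float = 1e-3):
    """Forward-Euler integration of the linearised system for (P0, P1, P2).

    The o(p^2) term and the contribution of P3 are dropped, so the result is
    only meaningful up to o(p^2), as in the text.
    """
    a = 2.0 * p * (1.0 - p)          # coefficient 2p(1 - p)
    c = 1.0 + 5.0 * p - 2.0 * p * p  # coefficient 1 + 5p - 2p^2
    p0, p1, p2 = 1.0, 0.0, 0.0       # all-minuses start: P(0) = (1, 0, 0)
    steps = int(round(tau_end / h))
    for _ in range(steps):
        d0 = -c * p0 + p1 + 1.0
        d1 = a * p0 - 2.0 * p1 + 2.0 * p2
        d2 = a * p1 - 2.0 * p2 + p * p
        p0, p1, p2 = p0 + h * d0, p1 + h * d1, p2 + h * d2
    return p0, p1, p2
```

For small $p$ (say 0.01) and moderately large $\tau$, the computed first component stays within $O(p)$ of 1 and the second stays $O(p)$, in line with the assumptions used to linearise (20).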



Thus we have an initial value problem which we solve by the method of decoupling. We first
find the eigenvalues and eigenvectors of $A$. The characteristic polynomial of $A$ is

$$\lambda^3 + (5 + 5p - 2p^2)\lambda^2 + (8 + 14p - 2p^2)\lambda + 4 + 12p - 20p^2 + 28p^3 - 8p^4 .$$

An analysis of this cubic polynomial shows that all three roots are real, different, and negative
for small $p$. The eigenvalues of $A$ are

$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{bmatrix} = \begin{bmatrix} -1 - 3p + 14p^2 + o(p^2) \\ -2 - 2\sqrt{p} - p + \frac{11}{4}p^{3/2} - 6p^2 + o(p^2) \\ -2 + 2\sqrt{p} - p - \frac{11}{4}p^{3/2} - 6p^2 + o(p^2) \end{bmatrix} .$$
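The eigenvalue expansions can be checked against the cubic directly. In the Python sketch below (all helper names are hypothetical, and the coefficients are as reconstructed from the text), each truncated expansion is used as a starting point for Newton's method on the characteristic polynomial of $A$; the refined root should differ from the expansion only by higher-order terms.

```python
import math

def char_poly(lam: float, p: float) -> float:
    """Characteristic polynomial of A, with coefficients as read from the text."""
    return (lam**3 + (5 + 5*p - 2*p**2) * lam**2 + (8 + 14*p - 2*p**2) * lam
            + 4 + 12*p - 20*p**2 + 28*p**3 - 8*p**4)

def char_poly_derivative(lam: float, p: float) -> float:
    """Derivative of char_poly with respect to lam."""
    return 3*lam**2 + 2*(5 + 5*p - 2*p**2) * lam + (8 + 14*p - 2*p**2)

def asymptotic_eigenvalues(p: float):
    """Truncated expansions of the three eigenvalues (error terms dropped)."""
    s = math.sqrt(p)
    return (-1 - 3*p + 14*p**2,
            -2 - 2*s - p + 2.75*p*s - 6*p**2,
            -2 + 2*s - p - 2.75*p*s - 6*p**2)

def refine_root(guess: float, p: float, iterations: int = 50) -> float:
    """Newton's method on the characteristic polynomial, started at `guess`."""
    lam = guess
    for _ in range(iterations):
        lam -= char_poly(lam, p) / char_poly_derivative(lam, p)
    return lam
```

Calling `refine_root(g, p)` for each `g` in `asymptotic_eigenvalues(p)` with a small `p` gives three distinct negative roots, consistent with the claim that the eigenvalues are real, different and negative.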
Now, the eigenvector of $A$ corresponding to eigenvalue $\lambda_1$ is



ei = 



- 2p-i + 6 + 0{p) 
ip-i-l + 6p + 0(p2) 

1 



1 

4p2 



l-8p + 24p2 + 0(p3) 
4p2 



The eigenvector corresponding to eigenvalue A2 is 



62 



- i + |pV2 _ 7p + ^p3/2 _ + ^p5/2 + 0(p3) 

1 



1 



,3/2 



4:^p2 _ 101^5/2 ^ o(p3) 



VP 



Finally, the eigenvector corresponding to eigenvalue A3 is 



63 



1 

Vp~ 
1 

VP 



3 
2 
1 
2 



3 _ _ 1001 3/2 _ 43 jj2 
8V^^ 2^ 128^ 2^ 



1 



^_Mp_13p3/2_^^2 



Mlp5/2 



VP 



We now form the matrix T whose columns are constant multiples of the eigenvectors of A. 
That is 

T = [ ip^ei y/pes ] . 

Since all three eigenvalues are different, the eigenvectors $e_1$, $e_2$, and $e_3$ are linearly independent.
Hence the matrix $T$ is non-singular and $T^{-1}$ exists. Let us calculate the determinant of
$T$ to confirm that the approximated $T$ is non-singular.

det T = -2Vp + f - ^ . 

Now, we calculate the inverse of the matrix $T$. Since the determinant of $T$ is $O(p^{1/2})$ and $T$ is
accurate up to $O(p^{5/2})$, $T^{-1}$ will be correct up to $O(p^2)$.



1 + 6p — 6p'^ + o{p'^) 

p-2p3/2 + fp2 + o(p2) 

o(p2) 



1 + 13p + 79p2 + o(p2) 

_l + 13p_2p3/2 + ^p2+„(p2) _J 



2 + 32p-|-226p2 + o(p2) 

|p-4p3/2 + ^p2+„(p2) 

j885 
612 ' 



32 

hp 



-4p3/2_^p2+„(p2) 



Ignoring the error terms, we can verify that $T^{-1}AT$ is a diagonal matrix whose
diagonal elements are the eigenvalues of $A$, concurring with the theory. That is

$$T^{-1}AT = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} .$$

Let $P = TY$. Then we have a new system of differential equations given by

$$Y' = DY + G , \tag{26}$$

with initial condition $Y(0) = T^{-1}P(0)$, where $D = T^{-1}AT$ and $G = T^{-1}F$. Hence



Y(0) = T-^P(O) 
We also know that 



Thus 



1 + 6p — + o(p^) 
-p - 2p^/^ - ^p^ + o(p^ 





p^ + o{p^ 



1 + 6p — Ap^ + o{p'^) 
p - + ^p^ + o(p2) 



G = T^F 

Now, solving the three decoupled differential equations ([26]) yields 



-p - - ^p^ + o{p'^) 



1 + (3 + 3e-^)p + (1 - 7e-^ - 9e-^T)p2 + o(p2) 
(I + 5^-2-) p + (-e-2-r - |e-2- - |) p3/2 + (g-2r^ + + 29 ^-2. + g) p2 + o{p2) 



29 -2t 
16 '16 



Finally, the solution for (25) can be computed using $P = TY$. What remains
to be shown is that the three assumptions used in the proof are valid. They are:
$P_3^\tau = o(p^2)$, $P_1^\tau = O(p)$ and $P_0^\tau = 1 - O(p)$. The assumption on $P_3^\tau$ is validated in
Lemma 12. Let us consider the other two here. The final solution confirms that our
assumptions are valid at any time $\tau$ if they were valid initially. Clearly the assumptions
hold initially as, at time $\tau = 0$, we have $P_0^\tau = 1$ and $P_1^\tau = 0$. Hence, the final solution
holds for any $\tau$ and the proof is complete. □

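We will use generating functions to solve the recurrence (21). Let the function $F(x,t)$ be
defined by

$$F(x,t) = \sum_{\ell=0}^{\infty} P_\ell^t x^\ell .$$

Now, multiplying (21) by $x^{\ell+1}$ and summing over all $\ell \ge 2$, we obtain

$$\sum_{\ell=2}^{\infty} P_\ell^{t+1}x^{\ell+1} = \sum_{\ell=2}^{\infty} P_\ell^{t}x^{\ell+1} + \frac{1}{n}\Bigg(-2\sum_{\ell=2}^{\infty} P_\ell^{t}x^{\ell+1} + 2\sum_{\ell=2}^{\infty} P_{\ell+1}^{t}x^{\ell+1} + 2p(1-p)\sum_{\ell=2}^{\infty} P_{\ell-1}^{t}P_0^{t}x^{\ell+1} + p^2\sum_{\ell=2}^{\infty} x^{\ell+1}\sum_{k=0}^{\ell-2} P_k^{t}P_{\ell-2-k}^{t}\Bigg) . \tag{27}$$

The indices of (27) can be adjusted to get

$$x\sum_{\ell=2}^{\infty} P_\ell^{t+1}x^{\ell} = x\sum_{\ell=2}^{\infty} P_\ell^{t}x^{\ell} + \frac{1}{n}\Bigg(-2x\sum_{\ell=2}^{\infty} P_\ell^{t}x^{\ell} + 2\sum_{\ell=3}^{\infty} P_\ell^{t}x^{\ell} + 2p(1-p)x^2P_0^t\sum_{\ell=1}^{\infty} P_\ell^{t}x^{\ell} + p^2x^3\sum_{\ell=0}^{\infty} x^{\ell}\sum_{k=0}^{\ell} P_k^{t}P_{\ell-k}^{t}\Bigg) . \tag{28}$$

Note that the last term in (28) can be thought of as relating the sequence to its own
convolution, thus can be replaced by their product, obtaining

$$2p(1-p)x^2P_0^t\sum_{\ell=1}^{\infty} P_\ell^{t}x^{\ell} + p^2x^3\left(\sum_{n=0}^{\infty} P_n^{t}x^{n}\right)\left(\sum_{n=0}^{\infty} P_n^{t}x^{n}\right) .$$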
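The convolution step is the standard Cauchy-product identity: the coefficient sequence $\sum_k P_k P_{\ell-k}$ is exactly the coefficient sequence of the square of the generating function. A small Python check on a truncated, made-up sequence (the helper names and the sample values are purely illustrative, not data from the paper):

```python
def cauchy_product(a, b):
    """Coefficients of the product of two polynomials given by coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def evaluate(coeffs, x):
    """Evaluate sum_l coeffs[l] * x**l by Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

# Hypothetical truncated sequence P_0, P_1, P_2, P_3 used only for illustration.
sample = [0.9, 0.07, 0.02, 0.01]
convolved = cauchy_product(sample, sample)
```

Evaluating `convolved` at any `x` agrees with the square of `evaluate(sample, x)` up to rounding, which is the replacement made in the last term of (28).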



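Hence

$$x\left(F(x,t+1) - P_0^{t+1} - P_1^{t+1}x\right) = x\left(F(x,t) - P_0^{t} - P_1^{t}x\right) + \frac{1}{n}\Big(-2x\left(F(x,t) - P_0^t - P_1^tx\right) + 2\left(F(x,t) - P_0^t - P_1^tx - P_2^tx^2\right) + 2p(1-p)x^2P_0^t\left(F(x,t) - P_0^t\right) + p^2x^3F(x,t)^2\Big) .$$

This can be rearranged to get

$$x\,\frac{F(x,t+1) - F(x,t)}{1/n} - x\,\frac{P_0^{t+1} - P_0^t}{1/n} - x^2\,\frac{P_1^{t+1} - P_1^t}{1/n} = -2x\left(F(x,t) - P_0^t - P_1^tx\right) + 2\left(F(x,t) - P_0^t - P_1^tx - P_2^tx^2\right) + 2p(1-p)x^2P_0^t\left(F(x,t) - P_0^t\right) + p^2x^3F(x,t)^2 .$$

Substituting (15) and (23) into the above equation yields

$$x\,\frac{F(x,t+1) - F(x,t)}{1/n} = p^2x^3F(x,t)^2 + 2F(x,t)\left(1 - x + x^2P_0^tp(1-p)\right) - 2x^2p(P_0^t)^2(1-p) + P_0^t\left(x(1 - 5p + 2p^2) + 2x^2p(1-p) - 2\right) - P_1^tx + x . \tag{29}$$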

Now, let $y(\tau) = F(x,t)$ where $\tau = t/n$ as defined before. Then, approximating and rescaling,
we get

$$x\,\frac{dy(\tau)}{d\tau} = p^2x^3y(\tau)^2 + 2y(\tau)\left(1 - x + x^2P_0^\tau p(1-p)\right) - 2x^2(P_0^\tau)^2p(1-p) + P_0^\tau\left(x(1 - 5p + 2p^2) + 2x^2p(1-p) - 2\right) - P_1^\tau x + x . \tag{30}$$

The following lemma proves that $y(\tau)$ has a radius of convergence greater than 1.

Lemma 11. The generating function $y(\tau)$ is bounded above and converges for some
$x > 1$.

Proof. Without loss of generality, let $x = 1 + p^3$. Substituting this value into the
differential equation (30) gives

$$\frac{dy(\tau)}{d\tau} = p^2y(\tau)^2 + 2y(\tau)P_0^\tau p(1-p) - 2(P_0^\tau)^2p(1-p) - P_0^\tau(1 + 3p) - P_1^\tau + 1 + o(p^2) . \tag{31}$$

Differential equation (31) is nonlinear. But we can linearise it by assuming $y(\tau) = 1 + O(p)$.
This assumption will be validated later. Under this assumption, the nonlinear
term satisfies

$$p^2y(\tau)^2 = p^2 + o(p^2) .$$

Then, substituting the solutions for $P_0^\tau$ and $P_1^\tau$ from Lemma 10 into (31) and simplifying,
we get the following linear differential equation.
$$\frac{dy(\tau)}{d\tau} - \left(2p - (10 - 2e^{-2\tau} - 6e^{-\tau})p^2\right)y(\tau) = (-2 - 3e^{-\tau})p + \left(16 + 9\tau e^{-\tau} - 4\tau e^{-2\tau} - 17e^{-2\tau} + 4e^{-\tau}\right)p^2 + o(p^2) . \tag{32}$$

This is a first order linear differential equation which could be solved by the method
of integrating factor. The integrating factor is

$$\mu(\tau) = e^{\int \left(-2p + (10 - 2e^{-2\tau} - 6e^{-\tau})p^2\right)d\tau} = e^{-2\tau p + (10\tau + e^{-2\tau} + 6e^{-\tau})p^2} .$$

Using Taylor approximations, we can approximate the integrating factor and its
inverse to get

$$\mu(\tau) = 1 - 2\tau p + (10\tau + 2\tau^2 + 6e^{-\tau} + e^{-2\tau})p^2 + o(p^2) , \tag{33}$$

and

$$\mu(\tau)^{-1} = 1 + 2\tau p - (10\tau - 2\tau^2 + 6e^{-\tau} + e^{-2\tau})p^2 + o(p^2) . \tag{34}$$

On multiplying (32) by $\mu(\tau)$ and using the approximation of the integrating factor in (33),
we obtain

$$\frac{d(y(\tau)\mu(\tau))}{d\tau} = (-3e^{-\tau} - 2)p + (16 + 4\tau + 15\tau e^{-\tau} - 4\tau e^{-2\tau} - 17e^{-2\tau} + 4e^{-\tau})p^2 + o(p^2) .$$

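Now let $u$ be defined by

$$u = (3e^{-\tau} + 2) - (16 + 4\tau + 15\tau e^{-\tau} - 4\tau e^{-2\tau} - 17e^{-2\tau} + 4e^{-\tau})p ,$$

such that

$$\frac{d(y(\tau)\mu(\tau))}{d\tau} = -pu + o(p^2) .$$

Let us now find the integral of $u$, which we will need later.

$$\int u\,d\tau = (-3e^{-\tau} + 2\tau) - \left(16\tau + 2\tau^2 - 15\tau e^{-\tau} - 19e^{-\tau} + 2\tau e^{-2\tau} + \tfrac{19}{2}e^{-2\tau}\right)p .$$

Now, suppose $u = \Omega(1)$. Then we have

$$\frac{d(y(\tau)\mu(\tau))}{d\tau} = -(1 + o(p))\,pu .$$

Integrating both sides of this equation, we get

$$y(\tau)\mu(\tau) = -(1 + o(p))\left((-3e^{-\tau} + 2\tau)p - \left(16\tau + 2\tau^2 - 15\tau e^{-\tau} - 19e^{-\tau} + 2\tau e^{-2\tau} + \tfrac{19}{2}e^{-2\tau}\right)p^2\right) + C , \tag{35}$$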

where $C$ is an arbitrary constant. We can determine the value of $C$ using the initial
condition $y(0) = 1$. Thus we have

$$1 + 7p^2 + o(p^2) = 3p - \tfrac{19}{2}p^2 + C .$$

Hence, the initial condition will be satisfied if $C = 1 - 3p + \tfrac{33}{2}p^2 + o(p^2)$. Substituting
this value and (34) into (35) and simplifying using Taylor approximations, we get the
following solution.

$$y(\tau) = 1 - 3(1 - e^{-\tau})p + \left(2\tau e^{-2\tau} + \tfrac{17}{2}e^{-2\tau} - 9\tau e^{-\tau} - 25e^{-\tau} + \tfrac{33}{2}\right)p^2 + o(p^2) .$$

We can therefore conclude that while $u = \Omega(1)$, $y(\tau)$ cannot deviate much from the
above solution. The solution is bounded above as required. It is easily verified that
this solution agrees with our assumption that $y(\tau) = 1 + O(p)$. Since our assumption is
valid initially, i.e. $y(\tau = 0) = 1$, the solution is valid for all $\tau$. The lemma is proved. □

We have just proved that the generating function $F(x,t)$ converges when $x = 1 + p^3$.
Before looking at the subsequent results, let us validate an assumption made in
Lemma 10 that $P_3^\tau = o(p^2)$.

Lemma 12. The assumption that $P_3^\tau = o(p^2)$ is valid. In fact, we have $P_\ell^\tau = o(p^2)$ for
all $\ell \ge 3$.

Proof. From Lemma 10 we have

$$P_0^\tau + P_1^\tau + P_2^\tau = 1 - 3(1 - e^{-\tau})p + \left(2\tau e^{-2\tau} + \tfrac{17}{2}e^{-2\tau} - 9\tau e^{-\tau} - 25e^{-\tau} + \tfrac{33}{2}\right)p^2 + o(p^2) . \tag{36}$$

Now let $g(\tau) = F(1,t)$. Then, from (29), we obtain

$$\frac{dg(\tau)}{d\tau} = p^2g(\tau)^2 + 2g(\tau)P_0^\tau p(1-p) - 2p(P_0^\tau)^2(1-p) - P_0^\tau(1 + 3p) - P_1^\tau + 1 . \tag{37}$$

Note that, by definition, $g(\tau)$ is equal to the sum of the probability bounds $P_\ell^\tau$. Now
comparing the equations (37) and (31) reveals that $g(\tau)$ and $y(\tau)$ are identical
except for some error terms in $o(p^2)$. It is then readily verified that the solution for $g(\tau)$
will be identical to that for $y(\tau)$. Hence, from Lemma 11 we have

$$g(\tau) = 1 - 3(1 - e^{-\tau})p + \left(2\tau e^{-2\tau} + \tfrac{17}{2}e^{-2\tau} - 9\tau e^{-\tau} - 25e^{-\tau} + \tfrac{33}{2}\right)p^2 + o(p^2) . \tag{38}$$

Notice that both (36) and (38) have functions of the same order on the right
hand side. Hence, the additional terms that are missing on the left hand side in (36)
must be of the order $o(p^2)$. That is,

$$\sum_{\ell \ge 3} P_\ell^\tau = o(p^2) ,$$

proving the Lemma. □

In Lemma 11 we proved that the generating function $F(x,t)$ converges when $x = 1 + p^3$.
Hence, if $\ell$ is sufficiently large, the following holds.

$$P_\ell^\tau x^\ell < 1, \quad \text{i.e.} \quad P_\ell^\tau < \frac{1}{x^\ell} .$$

Otherwise, there is an infinite sequence with $P_\ell^\tau \ge 1/x^\ell$ which contributes an infinite
amount to the sum, contradicting the lemma. Thus, for some constant $\gamma > 0$, we have

$$P_\ell^\tau < \frac{\gamma}{(1 + p^3)^\ell} \tag{39}$$

for all $\ell$. Using this result, the following lemma proves that it takes exponential time
before a plus-run of length $\Omega(n)$ can be formed on the cycle.

Lemma 13. The following statement fails with probability exponentially small in $n$: if
the game is started with all minuses on the cycle, it would take exponential time before
a plus-run of length $n/4$ or longer can be created.

Proof. By definition, the probability that a run of length $n/4$ starts at position $i$ at a given
time $\tau$ is at most $P_{n/4}^\tau$. As the game is started with a symmetrical configuration (i.e.
all-minuses), the result at any time will be symmetrical too. Hence the probability that
such a run exists at any of the cycle's $n$ positions at a given time $\tau$ is at most
$nP_{n/4}^\tau$. Finally, the probability of finding such a run at any position on the cycle at
any time within $T$ steps is at most

$$TnP_{n/4}^\tau .$$

It has already been shown in (39) that, when $\ell$ is sufficiently large,

$$P_\ell^\tau < \left(\frac{1}{1 + p^3}\right)^{\ell} .$$

Hence the probability that a run of length $n/4$ is created in $T$ steps is at most

$$TnP_{n/4}^\tau < Tn\left(\frac{1}{1 + p^3}\right)^{n/4} .$$

This probability is exponentially small whenever $T$ is polynomially bounded. In other
words, $T$ has to be exponentially large before a run of length $n/4$ can appear on the
cycle. Clearly, longer runs require even longer time, proving the lemma. □
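To see how quickly the bound in the proof decays, it helps to evaluate it in log space. The sketch below is illustrative only: the function name `log_union_bound` is hypothetical, the decay base $1 + p^3$ is the value of $x$ for which convergence was shown, and the constant `gamma` defaults to 1 as a placeholder since the paper does not estimate it.

```python
import math

def log_union_bound(T: float, n: int, p: float, gamma: float = 1.0) -> float:
    """Natural log of T * n * gamma * (1 + p**3) ** (-n / 4).

    Working with logarithms avoids floating-point underflow for large n.
    """
    return (math.log(T) + math.log(n) + math.log(gamma)
            - (n / 4.0) * math.log1p(p ** 3))
```

For a fixed small `p`, the linear term in `n` dominates once `n` is well above $4/p^3$, so any polynomial `T` leaves the logarithm strongly negative, which is the sense in which `T` must be exponentially large.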

Remark 4. As mentioned earlier, the discretisation error is $O(e^{-1/p})$. However, the
analysis above has error terms $o(p^2)$. Thus, for small enough $p$, the former is insignificant.

Proof of Theorem 2. For the game to converge to all-cooperation, at some point
in time, there must be a plus-run of length $n/4$ or longer. The result then follows from
Lemma 13.

As the error in the analysis is $o(p^2)$, the value for $p$ should be small enough so that
$o(p^2)$ terms can be ignored. This completes the proof. □

Remark 5. SRP also shows behaviour similar to RP for small enough $p$. That is, we
can prove that there exists a small enough $p$ for which it takes exponential time for the
evolution of cooperation for SRP. The same approach as the one used for RP can be
used here. We performed the analysis in this way and found that it is predictably much
simpler.

4 Emergence of defection 

The case where p = 0 is easy to analyse because p = 0 implies that there is no randomness
in the strategies. As mentioned before, both RP and SRP are equivalent when
p = 0, thus the same analysis applies to both strategies. The transition diagram of the
resultant strategy is shown in Figure 4. Clearly, the process converges to the all-minuses
state if there are any minuses on the cycle at the beginning of the game. Theorem 3
computes the time it takes for this.

Proof of Theorem 3. It is easy to check that it will take the longest to reach the
absorbing state (all-minuses state) if there is only one minus, i.e. a singleton, on the
cycle at the beginning of the game. Therefore, we can use this setting as the initial
configuration for the worst case analysis.

Figure 4: The transition diagram of RP and SRP when p = 0.

Note that at each step of the game, the probability of spreading minus to a neighbour
is $2/n$. Let $T_i$ denote the number of steps it takes to go from $i$ minuses to $(i+1)$ minuses
on the cycle. Thus we have

$$\Pr(T_i = t) = \left(1 - \frac{2}{n}\right)^{t-1}\frac{2}{n} .$$

Clearly, $T_i$ has geometric distribution with probability of success $2/n$. Therefore $\mathrm{E}[T_i] = n/2$.
Hence we get

$$\mathrm{E}[T] = \sum_{i=1}^{n-1}\mathrm{E}[T_i] = \sum_{i=1}^{n-1}\frac{n}{2} = \frac{n(n-1)}{2} .$$
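The expectation can be checked by simulation. The following Python sketch draws $T$ as a sum of $n-1$ independent geometric waiting times with success probability $2/n$, as in the argument above; the function names and the use of inverse-transform sampling are illustrative choices, not part of the paper.

```python
import math
import random

def expected_spread_time(n: int) -> float:
    """E[T] = sum_{i=1}^{n-1} n/2 = n(n-1)/2 for a cycle of n vertices."""
    return n * (n - 1) / 2.0

def sample_spread_time(n: int, rng: random.Random) -> int:
    """Draw T as a sum of n-1 independent Geometric(2/n) waiting times."""
    if n < 3:
        raise ValueError("need n >= 3 so that the success probability 2/n is below 1")
    q = 2.0 / n
    total = 0
    for _ in range(n - 1):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Inverse-transform sample of a geometric variable on {1, 2, ...}.
        total += int(math.log(u) / math.log(1.0 - q)) + 1
    return total
```

Averaging `sample_spread_time(n, rng)` over a few thousand draws should land close to `expected_spread_time(n)`, with fluctuations that shrink as the number of draws grows.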

Let us now get a bound on the probability of getting large deviations from the mean
$\mathrm{E}[T]$. Let $X^t$ denote the event that the number of minuses was not increased in the
first $t$ trials. Then,

$$\Pr(X^t) = \left(1 - \frac{2}{n}\right)^t \le e^{-2t/n} .$$

If $t = \beta n \log n / 2$, then $\Pr(X^t) \le n^{-\beta} = 1/n^{\beta}$. But we know from the definition of $X^t$
that

$$\Pr\left(T_i > \frac{\beta n \log n}{2}\right) \le \Pr(X^t) \le \frac{1}{n^{\beta}} .$$

Thus, deviations of size $\beta n \log n / 2$ are unlikely. In other words, $T_i$ lies within the range
$[0, \beta n \log n / 2]$ with high probability. Now, define a set of random variables $Y_i$ such that
$Y_i = \frac{2T_i}{\beta n \log n}$. Then, $Y_i \in [0,1]$ with high probability. Also, we have

$$\mathrm{E}[Y] = \frac{2\,\mathrm{E}[T]}{\beta n \log n} = \frac{n-1}{\beta \log n} .$$

As $Y_1, Y_2, \ldots, Y_{n-1}$ are independent random variables taking values in $[0,1]$, we can apply
the Chernoff bound to get

$$\Pr\left(Y \notin [(1 \pm \varepsilon)\mathrm{E}[Y]]\right) \le 2e^{-\frac{1}{3}\varepsilon^2\frac{n-1}{\beta \log n}} .$$
Vn—l '



the following holds. 

Pr(y ^ [{I ± e)B[Y]]) < 2e-3^'°s" 



2 



It follows immediately that T lies within the range 



Thus we can conclude that T G 



n{n—l) 
2 



lb 0(n2 logn^ 



libe)E[T]] with high probability, 
with high probability. □ 



5 Experimental results 



Theorem 1 proves that cooperation emerges fast when p is high, and Theorem 2 shows
that cooperation emerges exponentially slowly when p is small enough. As it is not
clear what happens for p between these two ranges, we carried out an empirical study.
The results of this study are presented in this section.

5.1 Simulation model 

The experimental results presented in this paper were obtained from a computer program
which we developed to simulate the IPD game played by agents arranged as the
vertices of a cycle. This program takes the length of the cycle and a value for p as the
input parameters and plays the game until cooperation emerges or the number of iterations
reaches a predefined maximum, whichever happens first. The maximum number of
iterations attempted is 43 x 10^. At each step of the game, an edge is chosen uniformly
at random and the game is played by the associated agents based on RP. Experiments
were performed in a homogeneous setting where all players on the cycle adopt the same
strategy. In our experiments, the game was started with all players playing defect.
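The simulation loop described above can be sketched as follows. This is not the authors' program. The per-edge update rule is an assumption inferred from the run dynamics analysed in Section 3.1 (two minuses each turn into a plus independently with probability p, a plus facing a minus becomes a minus, two pluses stay as they are), and the names `simulate_cycle` and `max_steps` are hypothetical.

```python
import random

def simulate_cycle(n: int, p: float, max_steps: int, rng: random.Random):
    """Play until all agents cooperate; return the step count, or None on timeout.

    state[v] is True for a cooperator (plus) and False for a defector (minus).
    ASSUMED update rule for the two agents on the chosen edge:
      minus-minus -> each becomes a plus independently with probability p,
      plus-minus  -> both end as minuses,
      plus-plus   -> unchanged.
    """
    state = [False] * n              # all players start by defecting
    cooperators = 0
    for step in range(max_steps):
        u = rng.randrange(n)         # edge {u, u+1} chosen uniformly at random
        v = (u + 1) % n
        before = state[u] + state[v]
        if not state[u] and not state[v]:
            state[u] = rng.random() < p
            state[v] = rng.random() < p
        elif state[u] != state[v]:
            state[u] = state[v] = False
        cooperators += state[u] + state[v] - before
        if cooperators == n:
            return step + 1
    return None
```

Averaging the returned step count over repeated runs for a range of p values, and recording the number of cooperators when the function times out, would give curves of the kind reported in this section, subject to the assumed update rule.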

When all agents on the cycle play RP, the time taken for reaching cooperation was
measured in terms of the number of steps required and plotted against the values of p
in Figure 5(a). For the cases where the all-cooperate state was not reached in 43 x 10^
steps, the number of cooperators was counted before abandoning the game and plotted
against p in Figure 5(b). Each data point in the graphs represents an average value of
100 repetitions.



5.2 Observations 



Figure 5(a) suggests that the absorption time decreases as p increases, which is to be
expected from the definition of the strategy. The results also support our theoretical
results that cooperation emerges quite fast for high p, and takes a very long time for
low p. However, there is a large gap between the minimum value of p that we proved
to give fast convergence and the lowest p having relatively fast convergence. To be
more precise, Figure 5(a) shows that the absorption time increases rapidly when p is in
the region 0.5-0.6. In other words, the convergence is relatively much faster when p
is greater than 0.6. Theorem 1, however, rigorously proves the fast convergence for RP
only when p > 0.870.

For small values of p, the emergence of cooperation took so long that we could
not reliably measure the time. This substantiates our theoretical result that it takes
exponential time for cooperation to emerge for small values of p. Interestingly, in
Figure 5(b) the proportion of the cooperators is seemingly about p. This can be explained
intuitively as follows. When the game starts, all agents are defectors. Thereafter, every
one of them decides to cooperate with probability p. These are exactly the ones we will
see for smaller p, since their decision to cooperate will not lead to others cooperating.

Figure 5: Simulation results for RP when applied to the IPD on cycle with n vertices.
(a) Timing curves for varying n, for higher p. (b) Graph showing the fraction of
cooperators on the cycle after 43 x 10^ steps, for smaller p.

In summary, the absorption time is exponentially large when p is in the region 0-0.5.
This drops considerably in the region 0.5-0.6 and is relatively small when p is greater
than 0.6. The results suggest that there is a sharp "phase transition" in the region
0.5-0.6.

Remark 6. We carried out simulations for SRP as well, but the results are not included
in this paper. The results obtained are quite similar to the results presented above for
RP. The main difference is that the apparent phase transition happens when p is in the
range 0.3-0.4 for SRP whereas it happens in the range 0.5-0.6 for RP.



6 Conclusions and open problems 

We have proposed randomised improvements to the Pavlov strategy for the multiplayer 
Iterated Prisoner's Dilemma game. This gives two new strategies called RP (Rational 
Pavlov) and SRP (Simplified Rational Pavlov) with a parameter p. We have studied the 
rate of convergence of these strategies both rigorously and experimentally when used 
on the cycle for playing the IPD. We have presented a complete analysis for RP and 
briefly remarked upon similar results we obtained for SRP. 

Since a rational player would choose to minimise risk without affecting long term 
return, a player playing RP or SRP should choose the lowest possible p that guarantees 
fast convergence to cooperation. Our results provide evidence (both theoretical and 
empirical) that players can safely choose p = 0.870 for RP and p = 0.699 for SRP, and 
still achieve fast cooperation. We have also shown that cooperation emerges exponentially
slowly when p is small enough and defection emerges (fast) when p = 0, for both
strategies. It is not clear what happens for intermediate p. Simulation results suggest 
that there is a sharp phase transition in this range. 

It remains as an open question whether the phase transition can be proved rigorously. 
Two other interesting open questions are: whether this process can be analysed on 
graphs other than cycles, and whether there are graphs with average degree greater 
than 2 where fast convergence to cooperation for RP and SRP occurs for any p. 